Linking Individuals Across Historical Sources: a Fully Automated Approach
Abstract
Linking individuals across historical datasets relies on information such as name and age that is both non-unique and prone to enumeration and transcription errors. These errors make it impossible to identify the correct match with certainty. We propose a fully automated method for linking historical datasets that enables researchers to create samples that minimize type I (false positive) and type II (false negative) errors. The first step of the method uses the Expectation-Maximization (EM) algorithm, a standard tool in statistics, to estimate the probability that any two records correspond to the same individual. The second step uses these estimated probabilities to determine which records to use in the analysis. We provide code to implement this method.
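The two steps can be illustrated with a minimal sketch. It is not the authors' released code, and it rests on assumptions the abstract does not state: each candidate pair is summarized by binary field-agreement indicators (for example name, age, birthplace), fields are conditionally independent given match status (a Fellegi-Sunter-style two-class mixture), and the second step keeps a link only when the best candidate's probability is high and the runner-up's is low. The function names, starting values, and thresholds `p_min` and `p_second_max` are hypothetical.

```python
import numpy as np

def em_match_probabilities(agree, n_iter=200, tol=1e-8):
    """Step 1 (sketch): EM for a two-class mixture over candidate pairs.

    agree: (n_pairs, n_fields) array of 0/1 agreement indicators.
    Returns the posterior match probability of each pair from the last E-step.
    """
    agree = np.asarray(agree, dtype=float)
    n_fields = agree.shape[1]
    p = 0.1                        # assumed starting share of true matches
    m = np.full(n_fields, 0.9)     # P(field agrees | match), starting value
    u = np.full(n_fields, 0.1)     # P(field agrees | non-match), starting value
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        like_m = p * np.prod(m**agree * (1 - m)**(1 - agree), axis=1)
        like_u = (1 - p) * np.prod(u**agree * (1 - u)**(1 - agree), axis=1)
        w = like_m / (like_m + like_u)
        # M-step: re-estimate the mixture share and agreement rates.
        p_new = w.mean()
        m = np.clip((w[:, None] * agree).sum(axis=0) / w.sum(), 1e-6, 1 - 1e-6)
        u = np.clip(((1 - w)[:, None] * agree).sum(axis=0) / (1 - w).sum(),
                    1e-6, 1 - 1e-6)
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return w

def select_links(pair_ids, probs, p_min=0.9, p_second_max=0.5):
    """Step 2 (sketch): keep a record's best candidate only if it is clearly
    better than the runner-up. Thresholds are illustrative assumptions."""
    by_a = {}
    for (a, b), pr in zip(pair_ids, probs):
        by_a.setdefault(a, []).append((float(pr), b))
    links = {}
    for a, cands in by_a.items():
        cands.sort(key=lambda t: t[0], reverse=True)
        best_p, best_b = cands[0]
        second_p = cands[1][0] if len(cands) > 1 else 0.0
        if best_p >= p_min and second_p <= p_second_max:
            links[a] = best_b
    return links

# Usage with hand-picked probabilities (hypothetical record ids).
pairs = [("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a2", "b4"), ("a3", "b5")]
print(select_links(pairs, [0.95, 0.10, 0.92, 0.80, 0.40]))  # {'a1': 'b1'}
```

Raising `p_min` or lowering `p_second_max` trades false positives for false negatives: fewer links are kept, but each is less ambiguous. The sketch checks ambiguity only from the first dataset's side; a fuller version would also require that each record in the second dataset is claimed at most once.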
Publication date: 2018